Sifting Truths from Multiple Low-Quality Data Sources
Abstract
In this paper, we study the problem of assessing the quality of co-reference tuples extracted from multiple low-quality data sources and finding true values from them. It is a critical part of an effective data integration solution. In order to solve this problem, we first propose a model to specify the tuple quality. Then we present a framework to infer the tuple quality based on the concept of quality predicates. In particular, we propose an algorithm underlying the framework to find true values for each attribute. Last, we have conducted extensive experiments on real-life data to verify the effectiveness and efficiency of our methods.
Similar resources
Discovering Multiple Truths with a Hybrid Model
Many data management applications require integrating information from multiple sources. The sources may not be accurate and provide erroneous values. We thus have to identify the true values from conflicting observations made by the sources. The problem is further complicated when there may exist multiple truths (e.g., a book written by several authors). In this paper we propose a model called...
Lower Bound Sifting for MDDs
Decision Diagrams (DDs) are a data structure for the representation and manipulation of discrete logic functions often applied in VLSI CAD. Common DDs to represent Boolean functions are Binary Decision Diagrams (BDDs). Multiple-valued logic functions can be represented by Multiple-valued Decision Diagrams (MDDs). The efficiency of a DD representation strongly depends on the variable ordering; t...
Domain-Aware Multi-Truth Discovery from Conflicting Sources
In the Big Data era, truth discovery has served as a promising technique to solve conflicts in the facts provided by numerous data sources. The most significant challenge for this task is to estimate source reliability and select the answers supported by high quality sources. However, existing works assume that one data source has the same reliability on any kinds of entity, ignoring the possib...
Evaluation of Image Segmentation Quality by Adaptive Ground Truth Composition
Segmenting an image is an important step in many computer vision applications. However, image segmentation evaluation is far from being well-studied in contrast to the extensive studies on image segmentation algorithms. In this paper, we propose a framework to quantitatively evaluate the quality of a given segmentation with multiple ground truth segmentations. Instead of comparing directly the ...
Augmented Sifting of Multiple-Valued Decision Diagrams
Discrete functions are now commonly represented by binary (BDD) and multiple-valued (MDD) decision diagrams. Sifting is an effective heuristic technique which applies adjacent variable interchanges to find a good variable ordering to reduce the size of a BDD or MDD. Linear sifting is an extension of BDD sifting where XOR operations involving adjacent variable pairs augment adjacent variable int...
Publication date: 2017